AITopics | Vermilion River County

Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers

Transformers have achieved great success in numerous NLP tasks but continue to exhibit notable gaps in multi-step factual reasoning, especially when real-world knowledge is sparse. Recent advances in grokking have demonstrated that neural networks can transition from memorizing to perfectly generalizing once they detect underlying logical patterns, yet these studies have primarily used small, synthetic tasks. In this paper, for the first time, we extend grokking to real-world factual data and address the challenge of dataset sparsity by augmenting existing knowledge graphs with carefully designed synthetic data to raise the ratio $\phi_r$ of inferred facts to atomic facts above the threshold required for grokking. Surprisingly, we find that even factually incorrect synthetic data can strengthen emergent reasoning circuits rather than degrade accuracy, as it forces the model to rely on relational structure rather than memorization. When evaluated on multi-hop reasoning benchmarks, our approach achieves up to 95-100% accuracy on 2WikiMultiHopQA, substantially improving over strong baselines and matching or exceeding current state-of-the-art results. We further provide an in-depth analysis of how increasing $\phi_r$ drives the formation of generalizing circuits inside Transformers. Our findings suggest that grokking-based data augmentation can unlock implicit multi-hop reasoning capabilities, opening the door to more robust and interpretable factual reasoning in large-scale language models.
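
The ratio of inferred facts to atomic facts described above can be illustrated with a minimal sketch. This is not the authors' code: the function name `two_hop_ratio`, the toy triples, and the choice to count every two-hop composition over the whole graph (rather than per relation) are assumptions made purely for illustration.

```python
from collections import defaultdict


def two_hop_ratio(atomic_facts):
    """Return (inferred_facts, ratio) for a set of (head, relation, tail) triples.

    Assumption: an "inferred fact" is any two-hop composition
    (h, (r1, r2), t) such that (h, r1, b) and (b, r2, t) are both atomic.
    """
    # Index outgoing edges by head entity so bridges can be looked up quickly.
    by_head = defaultdict(list)
    for head, relation, tail in atomic_facts:
        by_head[head].append((relation, tail))

    inferred = set()
    for head, r1, bridge in atomic_facts:
        for r2, tail in by_head.get(bridge, []):
            inferred.add((head, (r1, r2), tail))

    ratio = len(inferred) / len(atomic_facts) if atomic_facts else 0.0
    return inferred, ratio


# Toy knowledge graph (hypothetical data, not taken from the paper).
atomic = {
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
    ("Munich", "city_in", "Germany"),
    ("Germany", "located_in", "Europe"),
}
inferred, phi_r = two_hop_ratio(atomic)

# Adding one synthetic atomic fact creates new two-hop paths and raises the ratio.
augmented = atomic | {("Europe", "part_of", "Eurasia")}
inferred_aug, phi_r_aug = two_hop_ratio(augmented)

print(f"before augmentation: {len(inferred)} inferred, ratio {phi_r:.2f}")
print(f"after augmentation:  {len(inferred_aug)} inferred, ratio {phi_r_aug:.2f}")
```

In this toy graph a single added edge on a shared bridge entity doubles the number of two-hop facts, which is the kind of effect the abstract attributes to synthetic augmentation.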

large language model, machine learning, real-world multi-hop reasoning, (15 more...)

arXiv.org Artificial Intelligence

2504.20752

Country:

Europe > France (0.05)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Africa > Rwanda > Kigali > Kigali (0.04)
(11 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Film (0.67)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.48)

Image Clustering Conditioned on Text Criteria

Kwon, Sehyun | Park, Jaeseung | Kim, Minkyu | Cho, Jaewoong | Ryu, Ernest K. | Lee, Kangwook

arXiv.org Artificial Intelligence, Nov-29-2023

Classical clustering methods do not provide users with direct control of the clustering results, and the clustering results may not be consistent with the relevant criterion that a user has in mind. In this work, we present a new methodology for performing image clustering based on user-specified text criteria by leveraging modern vision-language models and large language models. We call our method Image Clustering Conditioned on Text Criteria (IC|TC), and it represents a different paradigm of image clustering. IC|TC requires a minimal and practical degree of human intervention and grants the user significant control over the clustering results in return. Our experiments show that IC|TC can effectively cluster images with various criteria, such as human action, physical location, or the person's mood, while significantly outperforming baselines.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2310.18297

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > Canada > Alberta > Census Division No. 10 > Vermilion River County (0.04)
North America > Canada > Alberta > Census Division No. 10 > Two Hills County No. 21 (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Media > Music (1.00)
Leisure & Entertainment > Sports (0.93)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Autoconj: Recognizing and Exploiting Conjugacy Without a Domain-Specific Language

Hoffman, Matthew D.

Neural Information Processing Systems, Dec-31-2018

Deriving conditional and marginal distributions using conjugacy relationships can be time consuming and error prone. In this paper, we propose a strategy for automating such derivations. Unlike previous systems which focus on relationships between pairs of random variables, our system (which we call Autoconj) operates directly on Python functions that compute log-joint distribution functions. Autoconj provides support for conjugacy-exploiting algorithms in any Python-embedded PPL. This paves the way for accelerating development of novel inference algorithms and structure-exploiting modeling strategies. The package can be downloaded at https://github.com/google-research/autoconj.
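
To make the idea of conjugacy concrete, here is a small hand-derived sketch for the Beta-Bernoulli pair. It does not use the Autoconj package or its API; the function names `log_joint` and `beta_posterior` are hypothetical, and the example only shows the kind of derivation such a system would aim to automate: reading posterior parameters off a log-joint function.

```python
import math


def log_joint(theta, xs, a=1.0, b=1.0):
    """Unnormalized log-joint of a Beta(a, b) prior on theta and Bernoulli data xs."""
    log_prior = (a - 1.0) * math.log(theta) + (b - 1.0) * math.log(1.0 - theta)
    log_lik = sum(
        x * math.log(theta) + (1 - x) * math.log(1.0 - theta) for x in xs
    )
    return log_prior + log_lik


def beta_posterior(xs, a=1.0, b=1.0):
    """Conjugate update: the posterior over theta is Beta(a + successes, b + failures)."""
    successes = sum(xs)
    failures = len(xs) - successes
    return a + successes, b + failures


xs = [1, 0, 1, 1]
a_post, b_post = beta_posterior(xs, a=2.0, b=3.0)

# As a function of theta, the log-joint equals the unnormalized Beta(a_post, b_post)
# log-density, which is what makes the prior conjugate to the likelihood.
theta = 0.3
lhs = log_joint(theta, xs, a=2.0, b=3.0)
rhs = (a_post - 1.0) * math.log(theta) + (b_post - 1.0) * math.log(1.0 - theta)
print(a_post, b_post, math.isclose(lhs, rhs))
```

The check at the end compares the two expressions at a single value of theta; the identity holds for every theta in (0, 1) because both sides collect the same powers of theta and 1 - theta.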

artificial intelligence, machine learning, programming language, (20 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.06)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Industry: Energy (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Software > Programming Languages (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Autoconj: Recognizing and Exploiting Conjugacy Without a Domain-Specific Language

Hoffman, Matthew D. | Johnson, Matthew J. | Tran, Dustin

arXiv.org Machine Learning, Nov-28-2018

Deriving conditional and marginal distributions using conjugacy relationships can be time consuming and error prone. In this paper, we propose a strategy for automating such derivations. Unlike previous systems which focus on relationships between pairs of random variables, our system (which we call Autoconj) operates directly on Python functions that compute log-joint distribution functions. Autoconj provides support for conjugacy-exploiting algorithms in any Python-embedded PPL. This paves the way for accelerating development of novel inference algorithms and structure-exploiting modeling strategies.

artificial intelligence, machine learning, programming language, (20 more...)

arXiv.org Machine Learning

1811.11926

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Energy (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Software > Programming Languages (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Semi-Supervised Learning Using Sparse Eigenfunction Bases

Sinha, Kaushik (Ohio State University) | Belkin, Mikhail (Ohio State University)

AAAI Conferences, Nov-3-2009

We present a new framework for semi-supervised learning with sparse eigenfunction bases of kernel matrices. It turns out that when the cluster assumption holds, that is, when the high density regions are sufficiently separated by low density valleys, each high density area corresponds to a unique representative eigenvector. Linear combinations of such eigenvectors (or, more precisely, of their Nyström extensions) provide good candidates for good classification functions. By first choosing an appropriate basis of these eigenvectors from unlabeled data and then using labeled data with Lasso to select a classifier in the span of these eigenvectors, we obtain a classifier that has a very sparse representation in this basis. Importantly, the sparsity appears naturally from the cluster assumption. Experimental results on a number of real-world datasets show that our method is competitive with state-of-the-art semi-supervised learning algorithms and outperforms the natural baseline algorithm (Lasso in the Kernel PCA basis).

artificial intelligence, eigenfunction, machine learning, (19 more...)

AAAI Conferences

2009 AAAI Fall Symposium Series

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Alberta > Census Division No. 10 > Vermilion River County (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
